SKU forecasting has been used extensively by retail chains since before analytics emerged as a field, and it is an important part of the retail business. Forecasting and inventory optimization work best for products with high demand and a long shelf life. Over time, the variety of products available to customers has grown rapidly. A few products are fast moving and others are relatively slow moving. Forecasting sales for slow-moving SKUs has been found to be quite difficult. The problem is further aggravated when the SKUs have a short shelf life (perishable goods like fruits and vegetables) and are slow moving (goods that serve a niche customer segment or are bought only occasionally).
The motivation for this project is to build a scalable forecasting system that can forecast sales of all SKUs, including slow-moving goods with a short shelf life, with reasonable accuracy. The value lies in a dashboard that businesses can use directly, even with little knowledge of the underlying methodology.
The dataset comes from a popular Indian retailer. The data is recorded daily, and the period used for analysis is 4 months. It covers a total of 968 SKUs, almost all of them perishable goods, and includes both slow-moving and fast-moving products.
## Date APPLE.FUJI.LOOSE APPLE.RED.DEL..MEDIUM BANANA.POOVAN
## 1 7/1/2014 3.89 0.00 1.70
## 2 7/2/2014 3.83 0.22 3.78
## 3 7/3/2014 1.04 0.64 2.17
## 4 7/4/2014 3.06 0.00 1.33
## 5 7/5/2014 3.76 0.67 3.59
## 6 7/6/2014 3.45 0.00 2.84
## CUCUMBER.HYBRID..MEDIUM...VWT TOMATO TOMATO.HYBRID ONION
## 1 2.90 7.56 7.37 15.95
## 2 2.42 10.18 10.99 10.72
## 3 2.60 6.48 13.20 15.37
## 4 2.40 9.89 10.71 11.89
## 5 2.23 10.77 23.68 25.25
## 6 3.71 12.72 26.27 27.50
## Date APPLE.FUJI.LOOSE APPLE.RED.DEL..MEDIUM BANANA.POOVAN
## Length:92 Min. :0.000 Min. : 0.000 Min. :0.390
## Class :character 1st Qu.:0.000 1st Qu.: 0.280 1st Qu.:1.792
## Mode :character Median :0.535 Median : 1.170 Median :2.805
## Mean :1.345 Mean : 1.747 Mean :2.877
## 3rd Qu.:2.245 3rd Qu.: 2.692 3rd Qu.:3.800
## Max. :8.090 Max. :10.500 Max. :6.910
## CUCUMBER.HYBRID..MEDIUM...VWT TOMATO TOMATO.HYBRID
## Min. :0.000 Min. : 1.670 Min. : 0.470
## 1st Qu.:1.817 1st Qu.: 6.683 1st Qu.: 6.053
## Median :2.600 Median : 9.585 Median : 8.485
## Mean :3.024 Mean :10.314 Mean : 9.216
## 3rd Qu.:3.805 3rd Qu.:12.725 3rd Qu.:11.585
## Max. :8.390 Max. :40.750 Max. :26.270
## ONION
## Min. : 0.000
## 1st Qu.: 6.952
## Median : 9.340
## Mean :11.361
## 3rd Qu.:13.143
## Max. :40.740
There are too many SKUs to examine one by one, so we select a few with dense (mostly non-zero) sales and a few with sparse sales, so that we can compare different methods on both kinds.
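One way to make this selection repeatable is to measure how often each SKU has a zero-sales day. The sketch below is only an illustration: the data is simulated, and the 20% cut-off (`sparse_cutoff`) is a hypothetical threshold, not one used in this project.

```r
# Simulated stand-in for the sales table: 92 days, one column per SKU.
set.seed(1)
n_days <- 92
sales <- data.frame(
  ONION            = round(rgamma(n_days, shape = 4, rate = 0.4), 2),  # sells every day
  TOMATO           = round(rgamma(n_days, shape = 5, rate = 0.5), 2),  # sells every day
  APPLE.FUJI.LOOSE = round(rgamma(n_days, shape = 2, rate = 1) *
                             rbinom(n_days, 1, 0.6), 2)                # many zero days
)

# Share of days with zero sales for each SKU.
zero_share <- sapply(sales, function(x) mean(x == 0))

# Hypothetical cut-off: call an SKU "sparse" if more than 20% of days are zero.
sparse_cutoff <- 0.20
sku_type <- ifelse(zero_share > sparse_cutoff, "sparse", "filled")

print(data.frame(zero_share = round(zero_share, 2), type = sku_type))
```

Ranking all SKUs by `zero_share` and picking a few from each end would give a mixed set like the seven SKUs shown above.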
As the plot above shows, sales behavior differs a lot from one SKU to another. Popular items like tomatoes and onions have more continuous sales. However, for a few varieties of apple that are less commonly sold, there are multiple days when sales are zero. Such frequent zeroes indicate that the product is not a regular purchase, which makes forecasting quite challenging for these SKUs. Retailers lose money when sales of these SKUs cannot be forecast accurately, either through stock-outs or through wasted stock.
Let's see how the current sales of each SKU are related to sales in previous time periods.
A lag plot is one way of doing this. However, it doesn't give any actionable evidence; it is just a visual way of looking at lagged correlation. We plot just one as an example.
lag.plot(data1$ONION, lags=9, do.lines=FALSE)
On the other hand, the ACF gives us more actionable evidence.
As the ACF plots show, for a few SKUs the first lag is significant, for others no lag is significant, and for a few a lag other than 1 is significant. Our goal here is to build a scalable forecasting system that can provide weekly/daily forecasts for all of the roughly 1,000 SKUs. Fitting individual models by hand may not be scalable, so we can use the auto.arima function in R (from the forecast package). Before that, we look at more basic time series techniques and evaluate their performance.
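Reading significant lags off a plot does not scale to hundreds of SKUs, so the same check can be done numerically. This is a minimal sketch on a simulated AR(1) series (not the retailer's data), using the usual approximate 95% bound of ±1.96/√n that `acf` draws as dashed lines.

```r
set.seed(2)
n_days <- 92

# Simulated stand-in for one SKU's daily sales: an AR(1) process around 10.
x <- 10 + as.numeric(arima.sim(model = list(ar = 0.7), n = n_days))

# Sample autocorrelations for lags 1..14 (lag 0 is dropped).
acf_out <- acf(x, lag.max = 14, plot = FALSE)
rho <- acf_out$acf[-1]

# Approximate 95% significance bound under the white-noise hypothesis.
bound <- qnorm(0.975) / sqrt(n_days)

significant_lags <- which(abs(rho) > bound)
print(significant_lags)
```

Wrapping this in a function and applying it to each column would give a per-SKU list of significant lags.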
Let's first decompose the series into its components to understand each of them. This is done using STL decomposition in R, which uses loess smoothing to estimate the seasonal component.
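A minimal sketch of such a decomposition with base R's `stl` follows. The series is simulated, and the weekly period (`frequency = 7`) and the day-of-week pattern are assumptions made for illustration; the seasonal period used in the project is not stated.

```r
set.seed(3)
n_days <- 92

# Simulated daily sales: level + slow trend + hypothetical day-of-week effect + noise.
day_effect <- c(0, -1, -1, 0, 1, 3, 4)
x <- 10 + 0.02 * seq_len(n_days) +
  rep(day_effect, length.out = n_days) + rnorm(n_days, sd = 1)

# stl() needs a ts object with a seasonal period; 7 assumes a weekly cycle.
y <- ts(x, frequency = 7)

# s.window = "periodic" forces the seasonal pattern to be identical in every cycle.
fit <- stl(y, s.window = "periodic")

# Columns are seasonal, trend and remainder; they add up to the original series.
components <- fit$time.series
print(head(components))
```

Calling `plot(fit)` shows the data and the three components in stacked panels.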
Let us first try smoothing techniques. The most popular one used for forecasting in the past is Holt-Winters smoothing, which smooths the series adaptively.
The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations: one for the level \(\ell_t\), one for the trend \(b_t\), and one for the seasonal component \(s_t\), with smoothing parameters \(\alpha\), \(\beta^*\) and \(\gamma\). We use \(m\) to denote the period of the seasonality, i.e., the number of observations in one seasonal cycle (for example, \(m = 7\) for daily data with a weekly pattern). In the forecast equation, \(h_m^+ = \lfloor (h-1) \bmod m \rfloor + 1\), which ensures that the seasonal estimates used for forecasting come from the final cycle of the sample.
\[\begin{align*} \hat{y}_{t+h|t} &= \ell_{t} + hb_{t} + s_{t-m+h_m^+} \\ \ell_{t} &= \alpha(y_{t} - s_{t-m}) + (1 - \alpha)(\ell_{t-1} + b_{t-1})\\ b_{t} &= \beta^*(\ell_{t} - \ell_{t-1}) + (1 - \beta^*)b_{t-1}\\ s_{t} &= \gamma (y_{t}-\ell_{t-1}-b_{t-1}) + (1-\gamma)s_{t-m} \end{align*}\]
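As a rough illustration, base R's `HoltWinters` fits an additive seasonal model of this kind and chooses the three smoothing parameters by minimising the squared one-step prediction error. The data below is simulated and the weekly period is an assumption. Note also that R's implementation writes the seasonal update slightly differently (it uses \(y_t - \ell_t\)), so its parameters are not exactly those of the equations above.

```r
set.seed(4)
n_days <- 92

# Simulated daily sales with a hypothetical day-of-week pattern.
day_effect <- c(0, -1, -1, 0, 1, 3, 4)
x <- 10 + 0.02 * seq_len(n_days) +
  rep(day_effect, length.out = n_days) + rnorm(n_days, sd = 1)
y <- ts(x, frequency = 7)  # m = 7, assuming a weekly cycle

# Additive Holt-Winters; alpha, beta and gamma are estimated from the data.
hw_fit <- HoltWinters(y, seasonal = "additive")
hw_params <- c(alpha = unname(hw_fit$alpha),
               beta  = unname(hw_fit$beta),
               gamma = unname(hw_fit$gamma))
print(round(hw_params, 3))

# Forecast the next 7 days.
hw_fc <- predict(hw_fit, n.ahead = 7)
print(hw_fc)
```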
Now let us check the accuracy of the forecasts from the Holt-Winters seasonal smoothing method.
## Accuracy SKU
## 1 0.632820156987846 APPLE.FUJI.LOOSE
## 2 0.692862944829418 APPLE.RED.DEL..MEDIUM
## 3 0.795209184636986 BANANA.POOVAN
## 4 0.631703193694769 CUCUMBER.HYBRID..MEDIUM...VWT
## 5 0.712608916664767 TOMATO
## 6 0.774961146980696 TOMATO.HYBRID
## 7 0.694278355486448 ONION
We see that the accuracy is not great for some SKUs, which is also easy to see from the plots of forecast versus actual values. Holt-Winters smoothing is an adaptive smoothing method and hence does better than pre-specified parametric ARMA models. However, there are abrupt peaks in the data that the smoother is not able to learn. This hints at using more advanced time series models that are adaptive in nature. Another reason for using adaptive methods is to create scalable forecasting systems for multiple SKUs without fitting a parametric model for each of them.
The basic structural model supposes that the observation process is the sum of a level L, a trend T describing the rate of change of the level, and a monthly seasonal component S. The model supposes that all these quantities are perturbed with Gaussian white noise at each time point. So, we have the following model equations:
\[ \begin{array}{lrcl} \mbox{[BSM1]} \quad\quad & Y_n &=& L_n + S_n + \epsilon_n \\ \mbox{[BSM2]} \quad\quad & L_{n} &=& L_{n-1} + T_{n-1} + \xi_n \\ \mbox{[BSM3]} \quad\quad & T_{n} &=& T_{n-1} + \zeta_n \\ \mbox{[BSM4]} \quad\quad & S_{n} &=& -\sum_{k=1}^{11} S_{n-k} + \eta_n \end{array} \]
(Adapted from Ed Ionides's GitHub page.)
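A model of this form can be fitted in base R with `StructTS(type = "BSM")`, which estimates the four noise variances by maximum likelihood. The sketch below uses simulated data and assumes a weekly cycle (`frequency = 7`); with that period the seasonal sum in [BSM4] runs over the previous 6 values rather than 11. Whether the project used `StructTS` or another package is not stated here.

```r
set.seed(5)
n_days <- 92

# Simulated daily sales with a hypothetical day-of-week pattern.
day_effect <- c(0, -1, -1, 0, 1, 3, 4)
x <- 10 + 0.02 * seq_len(n_days) +
  rep(day_effect, length.out = n_days) + rnorm(n_days, sd = 1)
y <- ts(x, frequency = 7)

# Basic structural model: local linear trend + seasonal + observation noise.
bsm_fit <- StructTS(y, type = "BSM")

# Estimated variances of the level, slope, seasonal and observation disturbances.
print(bsm_fit$coef)

# Forecast the next 7 days; predict() returns point forecasts and standard errors.
bsm_fc <- predict(bsm_fit, n.ahead = 7)
print(bsm_fc$pred)
print(bsm_fc$se)
```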
We see from the plots that the Holt-Winters method could not predict sudden peaks, whereas the structural model does a much better job of tracking the abrupt values.
Let us compare the mean absolute error (MAE) of the two techniques.
| SKU | Holt-Winters MAE | Structural model MAE |
|---|---|---|
| APPLE.FUJI.LOOSE | 0.897190083748298 | 0.654521509857477 |
| APPLE.RED.DEL..MEDIUM | 1.37977541966207 | 0.74757538463972 |
| BANANA.POOVAN | 1.02338744361694 | 0.816728905514546 |
| CUCUMBER.HYBRID..MEDIUM...VWT | 1.26296047925281 | 0.773077321219123 |
| ONION | 4.90029831282988 | 0.709230939929273 |
| TOMATO | 3.69852411383328 | 0.715118372132437 |
| TOMATO.HYBRID | 3.34646457646193 | 0.782557843516294 |
We see that the basic structural time series model has a lower mean absolute error than the Holt-Winters seasonal smoothing method for every SKU, and a far lower one for the high-volume SKUs (onion, tomato and tomato hybrid).
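For reference, a comparison of this kind can be sketched as follows. It is not stated above whether the reported errors are in-sample or from a hold-out period; this sketch assumes a hold-out of the last 14 days, uses simulated data, and assumes a weekly cycle, so its numbers are not comparable to the table.

```r
set.seed(6)
n_days <- 92

# Simulated daily sales with a hypothetical day-of-week pattern.
day_effect <- c(0, -1, -1, 0, 1, 3, 4)
x <- 10 + 0.02 * seq_len(n_days) +
  rep(day_effect, length.out = n_days) + rnorm(n_days, sd = 1)

# Hold out the last 14 days for evaluation.
train   <- ts(x[1:78], frequency = 7)
holdout <- x[79:92]
h <- length(holdout)

# Forecast the hold-out period with both methods.
hw_pred  <- as.numeric(predict(HoltWinters(train, seasonal = "additive"), n.ahead = h))
bsm_pred <- as.numeric(predict(StructTS(train, type = "BSM"), n.ahead = h)$pred)

# Mean absolute error: average size of the forecast miss, in sales units.
mae <- function(actual, predicted) mean(abs(actual - predicted))

errors <- c(holt_winters = mae(holdout, hw_pred),
            structural   = mae(holdout, bsm_pred))
print(round(errors, 3))
```

Looping the same steps over every SKU column would produce one row of the table per SKU.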
As we saw in the analysis above, the basic structural time series model fits the data quite well compared with the conventional Holt-Winters method, which is commonly used for predicting daily sales. Both are adaptive in nature, but the structural time series model adapts better than the Holt-Winters seasonal method.
The predictions are close to actual sales for fast-moving SKUs such as tomatoes, onions and bananas. However, the few SKUs that are not bought every day and have zero sales on a significant number of days have poor accuracy with both models.
The goal for the final project is to come up with a methodology for predicting sales of sparse-sales SKUs. Potential methods include more advanced structural time series models and zero-inflated models.
https://www.otexts.org/fpp
http://www.supplychainbrain.com/content/latest-content/single-article/article/in-the-world-of-perishables-forecast-accuracy-is-key/
To build the user interface / forecasting dashboard, R Shiny is being used. This is still a work in progress, but the basic version of the UI works fine. Here are a few snapshots of it.